Abstract: An upper bound on the capacity of a cascade of nonlinear and noisy channels is presented. The cascade mimics the split-step Fourier method for computing waveform propagation governed by the stochastic generalized nonlinear Schrödinger equation. It is shown that the spectral efficiency of the cascade is at most log(1 + SNR), where SNR is the receiver signal-to-noise ratio. The results may be applied to optical fiber channels. However, the definition of bandwidth is subtle and leaves the bound open to interpretation. Some of these interpretations are discussed.


I. Introduction

The capacity of the optical fiber channel seems difficult to compute or even bound. Perhaps the best-known capacity lower bound for optical fiber networks is given in [1]. Many follow-up papers have suggested modifications of this bound, e.g., see [?] and references therein. The purpose of this paper is to develop a simple capacity upper bound for a class of channels that is sometimes used to simulate signal propagation. As far as we know, this is the first capacity upper bound for the optical fiber channel. The bound is based on two basic tools: maximum entropy under a correlation constraint and the entropy power inequality (EPI). The main insight is that the nonlinearity that is commonly used to model optical fiber propagation does not change the differential entropy of a signal.

The capacity bound can be converted to a spectral efficiency bound by normalizing by an appropriate bandwidth. We caution, however, that it is not clear what the “right” choice for normalization should be. We therefore consider several options that may or may not be satisfactory for the engineering problem. We discuss extensions to the basic model that might help to clarify the issue.

This paper is organized as follows. In Sec. II we review basic results on complex random variables and their entropy. In Sec. III we consider continuous and discrete signal propagation models for optical fiber. In Sec. IV we develop an upper bound on capacity for the discretized model. In Sec. V we discuss subtleties concerning how to normalize capacity to compute a bound on spectral efficiency. We further outline some extensions. Sec. VI concludes the paper.

We remark that the methods presented here were recently adapted to an optical fiber model that is based on Hamiltonian energy-preserving dynamical systems [?].


II. Preliminaries

A. Proper Complex Random Variables

Let $j = \sqrt{-1}$ and consider a complex random column vector $\underline{X} = \underline{X}_c + j\underline{X}_s$ where $\underline{X}_c$ and $\underline{X}_s$ are real random column vectors. The covariance and pseudo-covariance matrices of $\underline{X}$ are defined as the respective

$$Q_{\underline{X}} = E\left[(\underline{X} - E[\underline{X}])(\underline{X} - E[\underline{X}])^{\dagger}\right] \tag{1}$$

$$\tilde{Q}_{\underline{X}} = E\left[(\underline{X} - E[\underline{X}])(\underline{X} - E[\underline{X}])^{T}\right] \tag{2}$$

where $\underline{X}^{T}$ and $\underline{X}^{\dagger}$ are the respective transpose and complex-conjugate transpose of $\underline{X}$. The complex vector $\underline{X}$ is said to be proper if $\tilde{Q}_{\underline{X}} = 0$ (see [11]). It is known that a linear or affine transformation of a proper complex $\underline{X}$ is also proper [11, Lemma 3].
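As a rough numerical illustration of the two definitions, the following sketch estimates the covariance and pseudo-covariance matrices from samples. The helper name `cov_and_pseudo_cov`, the sample size, the two test distributions, and the matrix `m` are assumptions made for this example; they do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 200_000, 2

# Proper example: real and imaginary parts are i.i.d. with equal variance.
x_proper = (rng.standard_normal((n, L)) + 1j * rng.standard_normal((n, L))) / np.sqrt(2)
# Improper example: the imaginary part has a much smaller variance.
x_improper = rng.standard_normal((n, L)) + 0.2j * rng.standard_normal((n, L))

def cov_and_pseudo_cov(x):
    """Sample estimates of (1) and (2); each row of x is one realization."""
    xc = x - x.mean(axis=0)
    q = xc.T @ xc.conj() / len(x)      # E[(X - E[X])(X - E[X])^dagger]
    q_tilde = xc.T @ xc / len(x)       # E[(X - E[X])(X - E[X])^T]
    return q, q_tilde

q_p, qt_p = cov_and_pseudo_cov(x_proper)
q_i, qt_i = cov_and_pseudo_cov(x_improper)

# A linear map of the proper vector (hypothetical matrix m) stays proper.
m = np.array([[1.0, 2.0j], [0.5, -1.0]])
_, qt_lin = cov_and_pseudo_cov(x_proper @ m.T)

print(np.abs(qt_p).max(), np.abs(qt_i).max(), np.abs(qt_lin).max())
```

The pseudo-covariance estimate is close to zero for the proper vector and for its linear image, and clearly nonzero for the improper one.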


B. Differential Entropy 

Let $h(\underline{X}) = h(\underline{X}_c, \underline{X}_s)$ be the differential entropy of $\underline{X}$. Two basic properties of $h(\cdot)$ are the translating and scaling properties: for a complex vector $\underline{v}$ and a complex square matrix $M$ we have

$$h(\underline{X} + \underline{v}) = h(\underline{X}) \tag{3}$$

$$h(M\underline{X}) = h(\underline{X}) + 2\log|\det(M)| \tag{4}$$

where $\det(M)$ is the determinant of $M$ and $|x|$ is the absolute value of $x$. For instance, if $M$ is unitary ($M^{-1} = M^{\dagger}$) then we have $|\det(M)| = 1$ and $h(M\underline{X}) = h(\underline{X})$. We remark that real random vectors and real matrices do not have the factor 2 in (4), see [?, Eq. (13)].
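The scaling property can be checked numerically for a special case. The sketch below assumes a zero-mean proper complex Gaussian vector, whose entropy is log[(πe)^L det(Q)] (the equality case of the maximum entropy result in Sec. II-D); the function name `gaussian_entropy` and the random test matrices are illustrative choices, not part of the paper.

```python
import numpy as np

def gaussian_entropy(q):
    """Entropy in nats of a zero-mean proper complex Gaussian with covariance q."""
    L = q.shape[0]
    _, logdet = np.linalg.slogdet(q)
    return L * np.log(np.pi * np.e) + logdet

rng = np.random.default_rng(1)
L = 3
g = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
q = g @ g.conj().T + np.eye(L)                  # Hermitian positive definite covariance
m = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))

h_x = gaussian_entropy(q)
h_mx = gaussian_entropy(m @ q @ m.conj().T)     # covariance of M X is M Q M^dagger
shift = 2 * np.log(abs(np.linalg.det(m)))       # the term 2 log|det(M)| in (4)

u, _ = np.linalg.qr(m)                          # u is unitary
h_ux = gaussian_entropy(u @ q @ u.conj().T)

print(h_mx - h_x, shift, h_ux - h_x)
```

The first two printed values agree, and the third is zero up to rounding, matching the unitary case described above.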


C. Discrete Fourier Transform

The discrete Fourier transform (DFT) of an $L \times 1$ vector $a$ is $A = Fa$ where $F$ is the $L \times L$ DFT matrix with entries

$$\frac{1}{\sqrt{L}}\, e^{-j 2\pi \ell m / L}, \qquad 0 \le \ell, m \le L-1.$$

Observe that $F$ is unitary ($F^{-1} = F^{\dagger}$). The inverse discrete Fourier transform (IDFT) of $A$ is $a = F^{\dagger} A$.
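A minimal sketch of this unitary DFT in NumPy is given below. The function name `dft_matrix` and the test vector are assumptions for illustration; the comparison uses NumPy's `norm="ortho"` option, which applies the same 1/√L scaling.

```python
import numpy as np

def dft_matrix(L):
    """L x L unitary DFT matrix with entries exp(-j 2 pi l m / L) / sqrt(L)."""
    idx = np.arange(L)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / L) / np.sqrt(L)

L = 8
F = dft_matrix(L)
a = np.arange(L) + 1j * np.ones(L)

A = F @ a                    # DFT
a_back = F.conj().T @ A      # IDFT via the conjugate transpose

print(np.allclose(F.conj().T @ F, np.eye(L)))
print(np.allclose(A, np.fft.fft(a, norm="ortho")))
print(np.allclose(a_back, a))
```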


D. Maximum Entropy 


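A useful property of $h(\cdot)$ is a maximum entropy result proved in [?, Thm. 2]: for an $L \times 1$ complex vector $\underline{X}$ with nonsingular correlation matrix $R(\underline{X}) := E[\underline{X}\,\underline{X}^{\dagger}]$ we have

$$h(\underline{X}) \le \log\left[(\pi e)^{L} \det(R(\underline{X}))\right] \tag{5}$$

with equality if and only if $\underline{X}$ is proper complex, Gaussian, and zero mean.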


E. Entropy Power 

The entropy power of a real random vector $\underline{X}$ of length $L$ is defined as $V(\underline{X}) = e^{2h(\underline{X})/L}/(2\pi e)$. But a complex vector $\underline{X}$ of length $L$ can be considered to be a real vector of length $2L$, so when dealing with complex vectors we instead use the definition $V(\underline{X}) = e^{h(\underline{X})/L}/(\pi e)$. Consider two independent, complex, random vectors $\underline{X}$ and $\underline{Y}$ of length $L$. The EPI states that [?, Sec. 17.8]

$$V(\underline{X} + \underline{Y}) \ge V(\underline{X}) + V(\underline{Y}). \tag{6}$$
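For zero-mean proper complex Gaussian vectors the complex entropy power reduces to det(Q)^(1/L), so the EPI can be illustrated with covariance matrices alone. The sketch below makes that Gaussian assumption; the function name `entropy_power` and the covariance values are hypothetical.

```python
import numpy as np

def entropy_power(q):
    """V = exp(h/L)/(pi e) for a zero-mean proper complex Gaussian with covariance q."""
    L = q.shape[0]
    _, logdet = np.linalg.slogdet(q)
    h = L * np.log(np.pi * np.e) + logdet
    return np.exp(h / L) / (np.pi * np.e)

# Independent Gaussian vectors: the covariance of the sum is the sum of covariances.
qx = np.diag([1.0, 4.0])
qy = np.diag([4.0, 1.0])
v_x, v_y, v_sum = entropy_power(qx), entropy_power(qy), entropy_power(qx + qy)

# Proportional covariances give equality in the EPI.
v_eq_sum = entropy_power(qx + 3.0 * qx)
v_eq_parts = entropy_power(qx) + entropy_power(3.0 * qx)

print(v_x, v_y, v_sum)        # 2, 2, 5 up to rounding
print(v_eq_sum, v_eq_parts)   # 8, 8 up to rounding
```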


III. Signal Propagation 


A. Continuous Space-Time Equations 

Suppose the signal $a(z,t)$ represents the optical field at location $z$ and time $t$. The location $z = 0$ usually represents the launch position, i.e., the launch signal is $a(0,t)$. We take the receive signal to be at position $z^*$, i.e., the receive signal is $a(z^*,t)$. For ideal distributed Raman amplification (see [1, Sec. IX.B]) the evolution of $a(z,t)$ is given by the generalized nonlinear Schrödinger equation (see [1, eq. (70)])

$$\frac{\partial a}{\partial z} + j\frac{\beta_2}{2}\frac{\partial^2 a}{\partial t^2} - j\gamma |a|^2 a = n \tag{7}$$

where $j = \sqrt{-1}$ and $n$ is a Gaussian noise process that is spatially white and bandlimited to $B_n$ Hertz. In other words, the noise spatial and temporal autocorrelation function is (see [1, eq. (53)])

$$E\left[n(z,t)\,n(z',t')^*\right] = \frac{N_{\mathrm{ASE}}}{z^*}\,\delta(z - z')\,B_n\,\mathrm{sinc}\left(B_n(t - t')\right) \tag{8}$$

where $\delta(x)$ is the Dirac-delta generalized function and $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$.

B. Discrete Space-Time Equations

The evolution of $a(z,t)$ in (7) is often computed by using the split-step Fourier method. This method discretizes both space and time, i.e., $z$ takes on values in the set $\mathcal{Z} = \{z_0, z_1, \ldots, z_K\}$ and $t$ takes on values in the set $\mathcal{T} = \{t_0, t_1, \ldots, t_{L-1}\}$. Usually the space and time values are chosen to be uniformly spaced, i.e., $z_k = \Delta_z k$ and $t_\ell = \Delta_t \ell$ for some constants $\Delta_z$ and $\Delta_t$ and integers $k = 0, 1, \ldots, K$, $\ell = 0, 1, \ldots, L-1$. We will use this simplification below, although a more general approach with non-uniform spacing is possible. We write $a(z_k)$ for the $L \times 1$ vector of sample values $a(z_k, t_\ell)$, $\ell = 0, 1, \ldots, L-1$, at position $z_k$.

The evolution of $a(z)$ from position $z = 0$ to position $z = z^*$ is performed by recursively computing $K$ “small” steps from position $z_k$ to $z_{k+1}$ for $k = 0, 1, \ldots, K-1$. More precisely, the signal evolution from position $z_k$ to position $z_{k+1}$ is computed by “splitting” the linear and nonlinear steps.

1) Nonlinear step. Compute the effect of nonlinearity via

$$a_N(z_{k+1}) = D_N\, a(z_k) \tag{9}$$

where $D_N$ is a diagonal matrix with entries

$$e^{j\gamma |a(z_k, t_\ell)|^2 \Delta_z}, \qquad \ell = 0, 1, \ldots, L-1 \tag{10}$$

and where $a(z_k, t_\ell)$ is the $(\ell+1)$th entry of $a(z_k)$.

2) Linear step. Use the DFT to compute $A_N(z_{k+1}) = F\, a_N(z_{k+1})$. Next compute the effect of dispersion via

$$A_L(z_{k+1}) = D_L\, A_N(z_{k+1}) \tag{11}$$

where $D_L$ is a diagonal matrix with entries

$$e^{-j(\beta_2/2)\,\ell^2/(L\Delta_t)^2\,\Delta_z}, \qquad \ell = 0, 1, \ldots, L/2 - 1 \tag{12}$$

$$e^{-j(\beta_2/2)\,(L-\ell)^2/(L\Delta_t)^2\,\Delta_z}, \qquad \ell = L/2, \ldots, L-1 \tag{13}$$

where we assumed that $L$ is even. Finally, use the IDFT to compute $a_L(z_{k+1}) = F^{\dagger} A_L(z_{k+1})$. Summarizing, the linear step has input $a_N(z_{k+1})$ and output

$$a_L(z_{k+1}) = F^{\dagger} D_L F\, a_N(z_{k+1}). \tag{14}$$

3) Noise step. Add noise whose variance is proportional to the space step $\Delta_z$, the time step $\Delta_t$, and the noise bandwidth $B_n$. We assume that the simulation bandwidth $B = 1/\Delta_t$ satisfies $B \ll B_n$. We compute the effect of noise via

$$a(z_{k+1}) = a_L(z_{k+1}) + n(z_{k+1}) \tag{15}$$

where the entries of the $L \times 1$ column vector $n(z_{k+1})$ are drawn independently from a proper complex Gaussian distribution with variance $(N_{\mathrm{ASE}} B_n / z^*)\,\Delta_z \Delta_t$.

In summary, one step in space requires computing

$$a(z_{k+1}) = F^{\dagger} D_L F D_N\, a(z_k) + n(z_{k+1}). \tag{16}$$

Although this equation looks linear, the nonlinearity arises because $D_N$ depends on the $|a(z_k, t_\ell)|^2$ for all $\ell$.

We remark that several split-step methods can be used, e.g., 
one can use two (fine) linear steps and one nonlinear step. The 
motivation for doing this is to improve numerical accuracy 
and/or speed up simulations. The choice of method does not 
affect the results below. 
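The recursion (16) can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the authors' simulator: the function name `split_step`, the parameter values, and the unitless scaling are hypothetical, the dispersion phases follow one reading of (12) and (13), and `noise_var` stands for the per-sample variance of the noise step (15).

```python
import numpy as np

def split_step(a0, K, dz, dt, beta2, gamma, noise_var, rng):
    """Apply K steps of a(z_{k+1}) = F^H D_L F D_N a(z_k) + n(z_{k+1})."""
    L = len(a0)
    ell = np.arange(L)
    # Index l for the lower half of the band and L - l for the upper half.
    f_idx = np.where(ell < L // 2, ell, L - ell)
    d_lin = np.exp(-1j * (beta2 / 2) * f_idx**2 / (L * dt) ** 2 * dz)
    a = np.asarray(a0, dtype=complex)
    for _ in range(K):
        # Nonlinear step: phase rotation that depends on the sample magnitude.
        a = np.exp(1j * gamma * np.abs(a) ** 2 * dz) * a
        # Linear step: unitary DFT, diagonal dispersion phases, unitary IDFT.
        a = np.fft.ifft(d_lin * np.fft.fft(a, norm="ortho"), norm="ortho")
        # Noise step: i.i.d. proper complex Gaussian samples with variance noise_var.
        noise = rng.standard_normal(L) + 1j * rng.standard_normal(L)
        a = a + np.sqrt(noise_var / 2) * noise
    return a

rng = np.random.default_rng(0)
a0 = rng.standard_normal(64) + 1j * rng.standard_normal(64)
out = split_step(a0, K=50, dz=0.1, dt=1.0, beta2=-1.0, gamma=1.0, noise_var=0.0, rng=rng)

energy_in = np.sum(np.abs(a0) ** 2)
energy_out = np.sum(np.abs(out) ** 2)
print(energy_in, energy_out)
```

With `noise_var=0.0` the input and output energies coincide, because both the nonlinear and the linear step only rotate phases. With noise switched on, each step adds `L * noise_var` to the energy on average.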

IV. Capacity Bound 

We develop an upper bound on the mutual information $I(a(0); a(z^*))$ between the channel input and output signals. The bound uses two basic ideas.

1) Compute the energy of the output signal and apply the maximum entropy bound (5).

2) Show that the nonlinear step does not change entropy and apply the EPI (6).





A. Output Energy 

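Consider the space step from position $z_k$ to position $z_{k+1}$. The correlation matrix $R(a_N(z_{k+1}))$ of $a_N(z_{k+1})$ has as $(\ell, m)$ entry the value

$$E\left[a_N(z_{k+1}, t_\ell)\, a_N(z_{k+1}, t_m)^*\right] = E\left[a(z_k, t_\ell)\, a(z_k, t_m)^*\, e^{j\gamma\left(|a(z_k, t_\ell)|^2 - |a(z_k, t_m)|^2\right)\Delta_z}\right]. \tag{17}$$

Observe that the $m = \ell$ entries do not change, i.e., the diagonal of $R(a_N(z_{k+1}))$ is the same as the diagonal of $R(a(z_k))$. We thus have $\mathrm{Tr}(R(a_N(z_{k+1}))) = \mathrm{Tr}(R(a(z_k)))$, where $\mathrm{Tr}(M)$ is the trace of the square matrix $M$. Next, we compute

$$R(a_L(z_{k+1})) = F^{\dagger} D_L F\, R(a_N(z_{k+1}))\, F^{\dagger} D_L^{\dagger} F. \tag{18}$$

We thus have $\mathrm{Tr}(R(a_L(z_{k+1}))) = \mathrm{Tr}(R(a_N(z_{k+1})))$ by repeatedly using $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$. Finally, we have

$$R(a(z_{k+1})) = R(a_L(z_{k+1})) + \frac{N_{\mathrm{ASE}} B_n}{z^*}\,\Delta_z \Delta_t\, I \tag{19}$$

where $I$ is the $L \times L$ identity matrix.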

Combining the above results, we have

$$\mathrm{Tr}(R(a(z_K))) = \mathrm{Tr}(R(a(z_0))) + L N_{\mathrm{ASE}} B_n \Delta_t = E_0 + N_{\mathrm{ASE}} B_n T \tag{20}$$

where $E_0 = \mathrm{Tr}(R(a(z_0)))$ is the input signal energy and $T = L\Delta_t$ is the total time. We further have

$$\begin{aligned}
\log\det R(a(z_K)) &\overset{(a)}{\le} \log\left(\prod_{i=1}^{L} R_{i,i}(a(z_K))\right) \\
&= \sum_{i=1}^{L} \log R_{i,i}(a(z_K)) \\
&\overset{(b)}{\le} L \log\left(\mathrm{Tr}(R(a(z_K)))/L\right) \\
&= L \log\left((E_0 + N_{\mathrm{ASE}} B_n T)/L\right)
\end{aligned} \tag{21}$$

where (a) follows by defining $R_{i,i}(a(z_K)) = E\left[|a(z_K, t_{i-1})|^2\right]$ as the $(i,i)$ entry of $R(a(z_K))$ and applying Hadamard's inequality [?, Sec. 17.9], and (b) follows by Jensen's inequality. Using (5), we thus have the entropy upper bound

$$h(a(z^*)) \le L \log\left(\pi e\,(E_0 + N_{\mathrm{ASE}} B_n T)/L\right). \tag{22}$$
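The two inequalities behind (21) can be checked on any Hermitian positive definite matrix. The sketch below uses a randomly generated correlation matrix as a stand-in; the matrix, its size, and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
L = 4
g = rng.standard_normal((L, 2 * L)) + 1j * rng.standard_normal((L, 2 * L))
R = g @ g.conj().T / (2 * L)                  # Hermitian, positive definite almost surely

_, logdet = np.linalg.slogdet(R)
diag = np.real(np.diag(R))
trace = np.real(np.trace(R))

hadamard = np.sum(np.log(diag))               # step (a): log of the product of diagonal entries
jensen = L * np.log(trace / L)                # step (b): concavity of the logarithm
entropy_bound = L * np.log(np.pi * np.e * trace / L)   # form of (22) with Tr(R) as the energy

print(logdet, hadamard, jensen, entropy_bound)
```

The printed values are nondecreasing from `logdet` to `jensen`, and `entropy_bound` equals `jensen` plus `L * log(pi * e)`.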

B. Entropy Preservation 

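The linear step preserves entropy because $D_L$ is a unitary matrix, and hence so is $F^{\dagger} D_L F$. For the nonlinearity, observe that every entry of $a_N(z_{k+1})$ has the form $|a|\, e^{j(\arg(a) + f(|a|))}$ where $\arg(a)$ is the phase of $a$ and $f$ is a smooth function. We compute

$$\begin{aligned}
h\left(|a|\, e^{j(\arg(a) + f(|a|))}\right) &\overset{(a)}{=} h\left(|a|,\ \arg(a) + f(|a|) \bmod 2\pi\right) + E[\log|a|] \\
&\overset{(b)}{=} h(|a|) + h\left(\arg(a) + f(|a|) \bmod 2\pi \,\big|\, |a|\right) + E[\log|a|] \\
&\overset{(c)}{=} h(|a|) + h\left(\arg(a) \bmod 2\pi \,\big|\, |a|\right) + E[\log|a|] \\
&= h(a)
\end{aligned} \tag{23}$$

where (a) follows by [?, eq. (318)], (b) follows by the chain rule for entropy, and (c) follows by (3). The above steps remain valid with conditioning. We thus have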
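Another way to see why a magnitude-dependent phase rotation leaves differential entropy unchanged is that, viewed as a map on the real and imaginary parts, it has a Jacobian determinant equal to one. The sketch below checks this numerically with central differences for the phase function f(|a|) = γ|a|²Δ_z; the function names, the parameter values, and the test points are hypothetical.

```python
import numpy as np

def nonlinear_map(xy, gamma=1.3, dz=0.7):
    """Real-coordinate form of a -> a * exp(j * gamma * |a|^2 * dz)."""
    a = complex(xy[0], xy[1])
    b = a * np.exp(1j * gamma * abs(a) ** 2 * dz)
    return np.array([b.real, b.imag])

def jacobian_det(func, xy, eps=1e-6):
    """Determinant of the 2 x 2 Jacobian of func at xy, via central differences."""
    xy = np.asarray(xy, dtype=float)
    cols = []
    for i in range(2):
        step = np.zeros(2)
        step[i] = eps
        cols.append((func(xy + step) - func(xy - step)) / (2 * eps))
    return np.linalg.det(np.column_stack(cols))

points = [(0.5, -1.2), (2.0, 0.3), (-0.7, 0.9)]
dets = [jacobian_det(nonlinear_map, p) for p in points]
print(dets)
```

Each determinant is one up to the finite-difference error, so the change-of-variables term in the entropy vanishes, in agreement with (23).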

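$$\begin{aligned}
h(a_N(z_{k+1}) \mid a(0)) &= \sum_{\ell=0}^{L-1} h\left(a_N(z_{k+1}, t_\ell) \,\big|\, a(0),\ a_N(z_{k+1}, t_0), \ldots, a_N(z_{k+1}, t_{\ell-1})\right) \\
&\overset{(a)}{=} \sum_{\ell=0}^{L-1} h\left(a(z_k, t_\ell) \,\big|\, a(0),\ a(z_k, t_0), \ldots, a(z_k, t_{\ell-1})\right) \\
&= h(a(z_k) \mid a(0))
\end{aligned} \tag{24}$$

where (a) follows because there is an invertible transformation from $a_N(z_{k+1}, t_\ell)$ to $a(z_k, t_\ell)$ for all $\ell$, and hence we can exchange these values when conditioning.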

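The above results imply that

$$\begin{aligned}
V(a(z_{k+1}) \mid a(0)) &= V\left(a_L(z_{k+1}) + n(z_{k+1}) \mid a(0)\right) \\
&\overset{(a)}{\ge} V(a_L(z_{k+1}) \mid a(0)) + V(n(z_{k+1}) \mid a(0)) \\
&= V(a(z_k) \mid a(0)) + (N_{\mathrm{ASE}} B_n / z^*)\,\Delta_z \Delta_t
\end{aligned} \tag{25}$$

where (a) follows by (6). By induction, we thus have $V(a(z_K) \mid a(0)) \ge N_{\mathrm{ASE}} B_n \Delta_t$ and therefore

$$h(a(z^*) \mid a(0)) \ge L \log\left(\pi e\, N_{\mathrm{ASE}} B_n T / L\right). \tag{26}$$

Combining the results (22) and (26), we have

$$I(a(0); a(z^*)) \le L \log\left(1 + \frac{E_0}{N_{\mathrm{ASE}} B_n T}\right). \tag{27}$$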
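For readers who want the intermediate step, the final bound follows by writing the mutual information as a difference of entropies and inserting the entropy upper bound (22) and the conditional entropy lower bound (26). The block below is a worked restatement using only the paper's symbols.

```latex
\begin{align*}
I(a(0); a(z^*)) &= h(a(z^*)) - h(a(z^*) \mid a(0)) \\
&\le L \log\!\left(\frac{\pi e\,(E_0 + N_{\mathrm{ASE}} B_n T)}{L}\right)
   - L \log\!\left(\frac{\pi e\, N_{\mathrm{ASE}} B_n T}{L}\right) \\
&= L \log\!\left(1 + \frac{E_0}{N_{\mathrm{ASE}} B_n T}\right).
\end{align*}
```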

V. Discussion 

A. Spectral Efficiency 

The bound (27) normalized by the time $T = L/B$ states that the capacity is upper bounded as

$$C \le B \log(1 + \mathrm{SNR}) \ \text{bits/s} \tag{28}$$

where $\mathrm{SNR} = E_0/(N_{\mathrm{ASE}} B_n T)$ is the signal-to-noise ratio.
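As a small numeric illustration, the sketch below evaluates the SNR and the capacity bound for one set of normalized parameters. The function name `capacity_upper_bound` and all numeric values are hypothetical, and a base-2 logarithm is assumed so that the result is in bits per second.

```python
import math

def capacity_upper_bound(e0, n_ase, b_n, T, B):
    """Return (SNR, B * log2(1 + SNR)) with SNR = E_0 / (N_ASE * B_n * T)."""
    snr = e0 / (n_ase * b_n * T)
    return snr, B * math.log2(1 + snr)

# Hypothetical normalized values: E_0 = 15, N_ASE = B_n = T = B = 1.
snr, c_max = capacity_upper_bound(e0=15.0, n_ase=1.0, b_n=1.0, T=1.0, B=1.0)
print(snr, c_max)   # 15.0 4.0
```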

To bound the spectral efficiency, we must normalize by the bandwidth. How to define bandwidth precisely is open to interpretation, but suppose for now that $a(z_k)$ has bandwidth $W(z_k)$. For example, $W(z_k)$ could be the smallest bandwidth for which the simulations do not substantially reduce the mutual information $I(a(z_0); a(z_k))$. Another approach is to choose $W(z_k)$ large enough to ensure that the “out-of-band interference” is “sufficiently weak”, since interference plays a major role when sharing spectrum.

We use the following approach. The maximal signal bandwidth is $W = \max_{0 \le k \le K} W(z_k)$ and we define the spectral efficiency to be

$$\frac{C}{W} \le \frac{B}{W} \log(1 + \mathrm{SNR}) \ \text{bits/s/Hz}. \tag{29}$$

The best bound follows by choosing $B = W$. We feel that this is the “right” approach because the frequency band with bandwidth $W$ carries (almost) all the mutual information, and because one usually assumes that only the signals of interest are present inside this band. The spectral efficiency is thus bounded by $\log(1 + \mathrm{SNR})$.

It remains to be seen whether our approach will be generally accepted. In any case, the upper bound (28) on capacity remains valid. As far as we know, this is the first such bound for optical fiber channels.

B. Extensions 

There are several possible extensions of the results. First, observe that for the nonlinear step the phase can be any function of the amplitude without changing (23). One will usually choose a smooth function so that the discrete version of the problem matches the continuous version.

Second, observe that for the linear step one may choose any unitary transform, i.e., any all-pass filter. For instance, the bound derived above remains valid for third-order dispersion.

Third, one may wish to study a model where Raman amplification takes place in the C-band (see [?, Fig. 21]) but advanced receivers collect and process signals both inside and outside this band. Outside the C-band, the signal will experience loss of at least 0.2 dB/km, which is at least 20 dB for 100 km. However, the noise outside the C-band is weak so that out-of-band processing could be interesting. In this case, the noise step (15) should be modified to include a frequency-dependent loss and noise variance. This model is realistic for heterogeneous systems where amplification is frequency dependent, and where receivers process outside of the bandwidth of their signals of interest. We hope that more detailed models such as this one could help to clarify what a proper definition of bandwidth might be.

Finally, the bound extends to problems with polarization, core, and mode multiplexing, as long as the linear and nonlinear steps preserve energy and entropy, and the noise is additive, independent, and Gaussian.

VI. Conclusions 

The spectral efficiency of a cascade of nonlinear and noisy channels was shown to be bounded by $\log(1 + \mathrm{SNR})$. The definition of spectral efficiency is subtle, however, because the notion of bandwidth is subtle, e.g., see [14]. In fact, a recent paper states that the spectral efficiency of the optical fiber channel can be larger than $\log(1 + \mathrm{SNR})$ when one normalizes by the input bandwidth $W(z_0)$ (see the text after (16) in [15]). Of course, $W(z_0)$ is generally smaller than the maximal bandwidth $W$. This observation is important because, among other considerations, the maximal bandwidth defines the required spectral spacing when using wavelength-division multiplexing (WDM).